Spark 4.1 | 4.0 | 3.5 | 3.4: Fail publish_changes procedure if there's more than one matching snapshot (#14955)
Conversation
Force-pushed from 0699a5f to 7ed4b84.
spark/v4.1/spark/src/main/java/org/apache/iceberg/spark/procedures/PublishChangesProcedure.java
```java
throw new ValidationException(
    "Cannot apply non-unique WAP ID. Found %d snapshots with WAP ID '%s'",
    numMatchingSnapshots, wapId);
```
I wonder if we should prevent this situation (two snapshots with the same WAP ID) from happening in the first place, at snapshot creation time?
I don't have a strong opinion here, but this might be considered a more significant / potentially breaking change? Technically having a duplicate WAP ID doesn't cause any problems until they are cherry-picked into main.
Do you think there might be legitimate uses for staging multiple changes under the same WAP ID? For example:
- staging multiple changes, evaluating all of them separately and then deleting all but one before committing.
- creating staged snapshots which are never intended to be published (for testing / evaluation / etc)
I am not super familiar with the original designs behind WAP in Iceberg, so I'll look through older commits to see if there's any mention of a uniqueness constraint.
spark/v4.1/spark/src/main/java/org/apache/iceberg/spark/procedures/PublishChangesProcedure.java
Force-pushed from 7ed4b84 to df314e9.
singhpk234 left a comment:
LGTM, thanks @SamWheating!
Let's give it some time before we check it in, in case other folks have feedback on this.
Thanks @singhpk234! What's the preferred approach for applying this change to previous Spark versions? Should I wait until this is approved and merged before creating a single backport PR for all of them?
```diff
-    if (!wapSnapshot.isPresent()) {
-      throw new ValidationException("Cannot apply unknown WAP ID '%s'", wapId);
+    Iterable<Snapshot> wapSnapshots =
+        Iterables.filter(
```
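For context, the filter-based version roughly looked like the following sketch, reconstructed from the truncated diff excerpt above (the `wap.id` summary-key lookup and the surrounding control flow are assumptions, not the exact PR code):

```java
// Reconstructed sketch of the filter-based lookup (assumed names, not the exact PR code).
Iterable<Snapshot> wapSnapshots =
    Iterables.filter(
        table.snapshots(),
        snapshot -> wapId.equals(snapshot.summary().get("wap.id")));

int numMatchingSnapshots = Iterables.size(wapSnapshots);
if (numMatchingSnapshots == 0) {
  throw new ValidationException("Cannot apply unknown WAP ID '%s'", wapId);
} else if (numMatchingSnapshots > 1) {
  throw new ValidationException(
      "Cannot apply non-unique WAP ID. Found %d snapshots with WAP ID '%s'",
      numMatchingSnapshots, wapId);
}
Snapshot wapSnapshot = Iterables.getOnlyElement(wapSnapshots);
```

This enforces the uniqueness check, but as the nit below points out, `Iterables.size` walks the entire snapshot history even after a second match is found.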
nit: Instead of filtering all matching snapshots, could we scan table.snapshots() once and fail as soon as we see a 2nd match (avoid full-history scan)?
This is a good point; I've rewritten the procedure to exit early on the first conflicting snapshot.
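For illustration, a minimal sketch of what such an early-exit scan can look like (variable names, the error message wording, and the `wap.id` summary-key lookup are assumptions here, not the exact merged code):

```java
// Sketch of an early-exit scan over the snapshot history (assumed names, not the merged code).
Snapshot matchedSnapshot = null;
for (Snapshot snapshot : table.snapshots()) {
  if (wapId.equals(snapshot.summary().get("wap.id"))) {
    if (matchedSnapshot != null) {
      // Second match: fail immediately instead of scanning the rest of the history.
      throw new ValidationException(
          "Cannot apply non-unique WAP ID. Found multiple snapshots with WAP ID '%s'", wapId);
    }
    matchedSnapshot = snapshot;
  }
}

if (matchedSnapshot == null) {
  throw new ValidationException("Cannot apply unknown WAP ID '%s'", wapId);
}
```

Note that with an early exit the exact match count is no longer available for the error message, so the sketch reports "multiple" rather than a `%d` count.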
Since this changes the documented behavior of publish_changes, should the docs be updated as well?
Definitely, but in that case we should also ensure that all of the different Spark distributions are updated to be consistent with the docs (not just 4.1). Actually, maybe I should get some feedback on this code before I replicate it into 4 different places 😂 If the code + doc changes look good I will add the backports.
@huaxingao could you take a look at the updated procedure and let me know what you think? If this looks good I will make another commit to backport the procedure change to the other distributions.
Thanks for the reviews @huaxingao and @singhpk234, I have backported the fix to the other Spark versions now, so everything should be consistent with the updated documentation. Let me know if there's anything else I can do to help get this merged!
@SamWheating I only checked 4.1. Are the backports to 3.4, 3.5, and 4.0 clean backports?
Yes, aside from a minor difference in a return type, the changes to the tests are identical between versions and all versions passed.
Thanks @SamWheating for the PR! Thanks @singhpk234 for the review!
Closes: #14953 - see this issue for a larger description and reproduction.

It's assumed that `wap.id` will be unique among snapshots, but this doesn't appear to be enforced anywhere, which can lead to unexpected results when only the first write is actually published.

This PR updates the `publish_changes` procedure to fail when multiple matching snapshots are identified.

If this change is approved I will backport it to the other Spark versions.
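For reference, a hedged sketch of how the duplicate situation can arise in the standard WAP flow (the catalog, table, and WAP ID names here are made up, and the table is assumed to have `write.wap.enabled` set):

```java
import org.apache.spark.sql.SparkSession;

public class WapDuplicateRepro {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().getOrCreate();

    // Stage two writes under the same WAP ID; each creates its own staged snapshot
    // carrying wap.id=wap-123 in its summary, and nothing prevents the duplicate.
    spark.conf().set("spark.wap.id", "wap-123");
    spark.sql("INSERT INTO demo.db.tbl VALUES (1)");
    spark.sql("INSERT INTO demo.db.tbl VALUES (2)");

    // Before this PR, publish_changes silently cherry-picked only the first match;
    // after this PR, the call fails with "Cannot apply non-unique WAP ID ...".
    spark.sql("CALL demo.system.publish_changes(table => 'db.tbl', wap_id => 'wap-123')");
  }
}
```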